(57) Abstract

A computer-based electronic document and/or paper-based document management application program. The program provides an
efficient way to automatically import, index, categorize, store, search, retrieve, manipulate and archive electronic documents. The program
is also capable of managing documents regardless of document type or document format.

COMPUTER-BASED DOCUMENT MANAGEMENT SYSTEM

BACKGROUND 

The present invention relates to computer-based document management
systems. More particularly, the present invention relates to a computer-based
document management system that has the capability of importing, organizing,
browsing, searching, and viewing paper-based documents and electronic
documents of any type or format from various sources.

In today's business environment, most businesses, from small businesses to
large corporate entities, organize and maintain a tremendous amount of
information, particularly information in the form of paper-based documents and
electronic documents. The task of organizing and maintaining such a large number
of documents can be, and typically is, a time-consuming and costly matter.

In response, the computer industry, particularly the computer software
industry, offers a number of computer application programs designed to help
mitigate this problem. Some of these computer application programs work in
conjunction with optical scanners to automatically import paper-based documents
into a host computer. Other application programs are directed more specifically at
providing electronic file management services for existing electronic documents.
Some of the more advanced computer application programs attempt to integrate a
number of different capabilities into a single application program. Among the
capabilities that some of the more advanced programs provide are automated
document importing, storage, manipulation, retrieval, indexing, and document
annotation.

However, despite the many features already offered by existing software
products, there is still a need to improve the efficiency of these products. This is
especially true with respect to the way in which these prior products import, store,
and otherwise organize electronic documents within a document collection. For
example, current products do not provide an efficient way in which to import
documents into a single document collection, nor do they provide an efficient way
in which to continuously and automatically update the document collection as new
documents are added, and as existing documents are modified and/or deleted.
Existing software products also do not efficiently manage document collections
consisting of documents that exhibit one of many different data formats. In
addition, existing products do not efficiently store documents in memory,
especially wherein documents may appear to be stored in and/or linked to more
than one location and/or document category. Consequently, managing a large
document collection can be a formidable task even with the assistance of various
software products presently on the market. Therefore, the ability to automatically
import, store, organize and manipulate the document collection with minimal user
interaction, and to do so in a most memory-efficient way, would be most desirable.

SUMMARY 

The present invention is directed to a method for managing documents in a
computer-based system. The present invention provides a number of
improvements over prior methods, particularly, the way in which the present
invention indexes, categorizes and stores a wide range of documents and document
types in its electronic database.

Accordingly, it is an object of the present invention to standardize the way
in which document information is maintained regardless of document type or
document format.

It is another object of the present invention to automatically index and
categorize a large quantity of paper-based and electronic documents and document
types.

It is another object of the present invention to efficiently and automatically
store, browse, and view a large quantity of paper-based and electronic documents.

It is still another object of the present invention to automatically modify
and/or manipulate a large quantity of paper-based and electronic documents.

In accordance with one aspect of the present invention, the foregoing and
other objects are achieved by a method of managing a document collection in a
computer system that involves importing a document into the computer system;
then storing the document in a memory location; automatically extracting attribute
data from the document; and generating a data structure for the document.
Moreover, the data structure contains the attribute data in a standardized format
regardless of document type or document format.

In accordance with another aspect of the present invention, the foregoing
and other objects are achieved by a computer-readable storage medium that has
stored therein a program which is capable of importing a document into a
computer-based system; storing the document in memory; automatically extracting
attribute data from the document; and generating a data structure corresponding to
the document. The data structure generated by the program contains the extracted
attribute data in a standardized format regardless of document type or document
format.



BRIEF DESCRIPTION OF THE DRAWINGS 
The objects and advantages of the invention will be understood by reading
the following detailed description in conjunction with the drawings, in which:

FIG. 1A is a diagram of a general purpose computer which could be used
to implement the present invention;

FIG. 1B is a diagram illustrating some of the features and utilities
employed by the present invention;

FIG. 2A is an exemplary representation of an STG file associated with a
document;

FIG. 2B is an exemplary representation of an STG file associated with a
clipped document;

FIG. 3 depicts the hierarchical organization of an exemplary document
collection in accordance with the present invention;

FIG. 4 is a screen display of the user interface associated with the change
notification utility;

FIG. 5 is a screen display of a first scanner preferences user interface;
FIG. 6 is a screen display of a second scanner preferences user interface;

FIG. 7 is a screen display of a third scanner preferences user interface;

FIG. 8 is a screen display of a fourth scanner preferences user interface;

FIG. 9 is a screen display of the scanner control interface;

FIG. 10 is a screen display of a first Browser utility user interface;

FIG. 11 is a screen display of a second Browser utility user interface;

FIG. 12 is a screen display of the second Browser utility user interface with
a customizable application toolbar;

FIG. 13 is a screen display of a third Browser utility user interface
containing icons representing transitional documents;

FIG. 14 illustrates the user interface for conducting a basic document
search;

FIG. 15 illustrates the user interface for conducting an advanced document
search;

FIG. 16 is a screen display of the document viewing utility user interface;

FIGs. 17A-D illustrate a drag and drop operation in conjunction with the
creation of a clipped document;

FIG. 18 is a screen display of the file helper user interface;

FIG. 19 is a screen display of a secondary file helper utility user interface;

FIG. 20 is a screen display of the directory monitor user interface;

FIG. 21 is a screen display of the task manager user interface; and

FIG. 22 is a screen display of the system task bar containing the task
manager icon.

DETAILED DESCRIPTION 
The present invention involves a system and/or method for managing
electronic documents in a general purpose computer, such as the general purpose
computer 100 illustrated in FIG. 1A. The present invention further includes a
system and/or method for importing electronic documents and electronic
representations of paper-based documents from any number of different sources.
For example, the present invention is capable of importing electronic
representations of paper-based documents from a scanner 105, word processing
documents from an internal memory such as a RAM (not shown) or an external
memory 115 (e.g., a hard drive), e-mail from an internet connection 120, or a
document containing graphical image data from a server 125 supporting a local-area
network, to which the general purpose computer 100 is connected. Once a
document has been imported, the present invention employs a system and/or
method for automatically categorizing, indexing, browsing, viewing and otherwise
manipulating the document, along with each of the other documents contained in
what is herein referred to as the document collection. As one skilled in the art will
readily appreciate, the present invention can be implemented in software, using
standard programming methods and techniques which are well known in the art.

The present invention employs a number of core features 150 as well as a
number of document management utilities as illustrated in FIG. 1B. The core
features 150 refer to certain attributes or characteristics that the present invention
employs and/or executes in the background to support the various document
management utilities. For the purpose of simplicity, the following description of
the present invention is divided into the various core features 150 and document
management utilities. The order in which each invention feature and/or utility is
presented herein below is not intended to limit the present invention in any way.
Rather, the scope of the invention is given by the appended claims.

DATA STORAGE (STG) FILES 

The first core feature 150 of the present invention is a unique data storage
(STG) structure referred to herein as an STG file. The present invention maintains
an STG file for each document in the document collection. A new STG file is
created for each new document, and an existing STG file may be updated if the
corresponding document is modified.

Each STG file contains a number of standardized data fields. This provides
a way to maintain various attribute data and other information for a given
document in a common, standardized format regardless of the document's type
(e.g., text document versus image document) or the document's format (e.g.,
JPEG versus HTML). In a preferred embodiment, all STG files are stored in a
common disk directory.

FIG. 2A represents an exemplary STG file 200 along with some of the data
fields that may be contained therein. For example, STG file 200 may include a
data field 205 which contains a file name, e.g., "001.STG", to identify the
corresponding STG file 200. The STG file 200 may also include a data field 210
and a data field 215 which reflect, respectively, the memory location of the
corresponding document and a bit map defining a representative thumbnail. The
STG file 200 may also contain a data field 220 which reflects the raw text
associated with the corresponding document. The raw text data is primarily used
for indexing purposes. Indexing is described in greater detail below. In addition,
the STG file 200 is likely to contain a number of other data fields (not shown) for
such attributes as document author, publishing date, word count, annotations,
and/or key words if the document belongs to a particular category. Categories and
categorization of documents are also explained in detail below. If the document
corresponding to the STG file is an image document, data fields may be included
for such attributes as image type (e.g., color, black and white, or gray scale),
image dimension, and/or image meta-text with text position.
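The standardized fields described above can be pictured as a single record type. The following is a minimal Python sketch under stated assumptions, not the file format used by the invention: the class name `StgRecord`, its field names, the example paths, and the JSON serialization are all hypothetical. Only the kinds of data held (file name, document location, thumbnail bit map, raw text, optional attributes) come from the description.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class StgRecord:
    """Hypothetical in-memory form of one STG file (all field names are assumed)."""
    stg_name: str                    # data field 205: name of the STG file itself
    document_location: str           # data field 210: where the document is stored
    thumbnail_bitmap: bytes = b""    # data field 215: bit map for the thumbnail
    raw_text: str = ""               # data field 220: raw text used for indexing
    attributes: dict = field(default_factory=dict)  # author, date, word count, ...

    def to_json(self) -> str:
        """Serialize to one common format regardless of document type."""
        data = asdict(self)
        data["thumbnail_bitmap"] = self.thumbnail_bitmap.hex()  # bytes are not JSON
        return json.dumps(data, sort_keys=True)


# A text document and an image document share the same record layout.
text_doc = StgRecord("001.STG", "/docs/report.html", raw_text="quarterly report",
                     attributes={"author": "A. Author", "word_count": 2})
image_doc = StgRecord("003.STG", "/docs/photo.jpg",
                      attributes={"image_type": "gray scale"})
```

Type-specific attributes go into the open-ended `attributes` mapping, so the fixed fields stay identical for every document type.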

An STG file also exists for each clipped document stored by general
purpose computer 100. A clipped document is a special type of compound
document data structure. Typically, a clipped document incorporates a number of
related or component documents in a particular document order, similar to
attaching a number of physical documents together with a paper clip. When a
clipped document is created, an STG file is generated. Unlike STG files
associated with individual documents, an STG file associated with a clipped
document includes a data field having its own file name, e.g., "002.STG", and a
number of additional data fields which contain the identity of the STG files
corresponding to the component documents. FIG. 2B shows an exemplary STG
file 250 that is associated with a clipped document. As illustrated, STG file 250
links the clipped document with four component documents, wherein the STG files
that correspond with the four component documents are identified by their file
names as follows: 100.STG, 101.STG, 211.STG and 084.STG. Clipped
documents are described in greater detail below.
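As a rough illustration of the clipped-document structure, the sketch below models an STG record that holds only its own name and an ordered list of component STG file names. The class `ClippedStg`, the function `resolve_components`, and the document paths are assumptions made for this example. The file names mirror those given for FIG. 2B.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ClippedStg:
    """Hypothetical STG record for a clipped document."""
    stg_name: str
    component_stg_names: List[str] = field(default_factory=list)  # clip order


def resolve_components(clipped: ClippedStg, stg_directory: Dict[str, str]) -> List[str]:
    """Return the component document locations in the order they were clipped."""
    return [stg_directory[name] for name in clipped.component_stg_names]


# Assumed mapping from STG file name to document location.
stg_directory = {
    "100.STG": "/docs/cover.doc",
    "101.STG": "/docs/body.doc",
    "211.STG": "/docs/chart.jpg",
    "084.STG": "/docs/appendix.html",
}
clipped = ClippedStg("002.STG", ["100.STG", "101.STG", "211.STG", "084.STG"])
ordered_locations = resolve_components(clipped, stg_directory)
```

The clipped record stores references rather than copies, so each component document continues to exist once, under its own STG file.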

Aside from creating an STG file for each new document and each new
clipped document, an existing STG file may be updated if the corresponding
document or clipped document is edited or modified in some way. For example, if
a user modifies an existing word processing document, upon saving the modified
version of the document, the corresponding STG file is updated, if necessary,
particularly the text data field 220.

ORGANIZATION OF THE DOCUMENT COLLECTION 

In a preferred embodiment, the document collection is organized into a
hierarchy of files, clipped documents, and electronic folders, wherein electronic
folders may, in turn, contain additional files, clipped documents and nested
folders. The data that defines how the hierarchy of files, clipped documents and
electronic folders are organized with respect to each other is maintained in a
compound data structure referred to herein as the document collection organization
(DCO) file.

The DCO file is the second core feature described herein, and it contains,
in essence, all of the information necessary to resurrect or reconstruct the
document collection hierarchy, which takes on the appearance of an organizational
"tree" 300, as illustrated in FIG. 3. For example, the DCO file contains the
information necessary to establish that folder F1 contains two nested folders F2 and
F3. In addition, this exemplary DCO file contains the information necessary to
establish that there are a number of documents D1, D2, D3 and D4 directly
associated with folder F1; that there are two documents D3 and D4 directly
associated with folder F2; that there are two documents as well as a
nested folder F4 associated with folder F3; and that folder F4 contains a document
D4 and a clipped document S.
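One way to see how a single file can hold everything needed to reconstruct the hierarchy is the sketch below, which serializes a folder tree to JSON and rebuilds it. This is an assumption-laden illustration: the `Folder` class, the helper functions, the JSON encoding, and the `*.STG` link names are hypothetical. The two document names under F3 are placeholders, since the text does not give them legibly.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Folder:
    """One node of the hierarchy; links name STG files rather than hold documents."""
    name: str
    stg_links: List[str] = field(default_factory=list)
    subfolders: List["Folder"] = field(default_factory=list)


def to_dict(folder: Folder) -> dict:
    """Flatten a folder and everything beneath it into plain JSON-ready data."""
    return {"name": folder.name,
            "stg_links": list(folder.stg_links),
            "subfolders": [to_dict(sub) for sub in folder.subfolders]}


def from_dict(data: dict) -> Folder:
    """Rebuild the folder tree from the data produced by to_dict."""
    return Folder(data["name"], list(data["stg_links"]),
                  [from_dict(sub) for sub in data["subfolders"]])


# Hierarchy modeled on FIG. 3 as described in the text.
f4 = Folder("F4", ["D4.STG", "S.STG"])
f3 = Folder("F3", ["D5.STG", "D6.STG"], [f4])   # placeholder document names
f2 = Folder("F2", ["D3.STG", "D4.STG"])
f1 = Folder("F1", ["D1.STG", "D2.STG", "D3.STG", "D4.STG"], [f2, f3])

dco_text = json.dumps(to_dict(f1))          # what a DCO-like file might contain
rebuilt = from_dict(json.loads(dco_text))   # reconstruct the hierarchy from it
```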

In accordance with another aspect of the present invention, each user, in a
multiple-user environment, has the ability to create a user profile for a local
terminal or workstation. The user profile, in essence, defines a "local" version of
the primary document collection and the document collection hierarchy, which are,
in turn, defined by the various STG files and the DCO file respectively, as
described above. The user profile may define the local document collection such
that it includes all of, or a portion of, the documents in the primary document
collection. The user profile may also define the local document collection such
that it reflects a different document collection hierarchy than the one defined by the
DCO file for the primary document collection.

This is accomplished, in part, by maintaining a local STG file for each
document in the local document collection. In addition, a local version of the
DCO file is maintained, which defines the hierarchy of the documents in the local
document collection. Although a user can, of course, alter the content of an
existing document in the primary document collection by manipulating the
document locally, and hence, the content of the STG file associated with that
document, the user profile cannot alter the document collection hierarchy defined
by the DCO file for the primary document collection.

VIRTUAL DOCUMENT STORAGE 

The present invention also employs a virtual document storage scheme.
Virtual document storage is the third core feature described herein.

FIG. 3 illustrates the concept of this virtual document storage feature. It
will be recognized that document D4 appears in several folders within the
organizational hierarchy 300. First, it is associated with folder F1. Next, it is
associated with folder F2. Finally, it is associated with folder F4. However, in
accordance with a preferred embodiment of the present invention, this does not
mean that three copies of document D4 are stored in the DCO file. On the
contrary, the content of document D4, as with each and every document in the
document collection, is stored in its entirety in but one memory location, and the
DCO file links folders F1, F2 and F4 to the one copy of document D4 by providing
a pointer from each folder F1, F2 and F4 to the STG file 310 associated with the
document D4.

The virtual document storage feature saves memory space, it simplifies the
task of updating files, and it guarantees document integrity by maintaining but one
version of a given document, as one skilled in the art will readily understand. For
example, if a user modifies an existing document, such as document D4, the
modifications are reflected in the affected data fields in the corresponding STG file
310. Consequently, these modifications are reflected whenever the user, at a later
time, accesses the document D4 through folder F1, F2 or F4.
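The effect of storing one copy and linking to it from several folders can be sketched as follows. The dictionaries `stg_store` and `folder_links`, the helper names, and the stored values are assumptions for illustration. The point shown is only that a single modification becomes visible through every linked folder.

```python
# One STG record per document, stored exactly once (contents are assumed).
stg_store = {"D4.STG": {"location": "/docs/d4.doc", "raw_text": "first draft"}}

# Each folder holds only pointers (STG file names), never a copy of the document.
folder_links = {"F1": ["D4.STG"], "F2": ["D4.STG"], "F4": ["D4.STG"]}


def open_via(folder: str, stg_name: str) -> dict:
    """Follow a folder's pointer to the single stored record."""
    if stg_name not in folder_links[folder]:
        raise KeyError(f"{stg_name} is not linked to {folder}")
    return stg_store[stg_name]


def modify(stg_name: str, new_text: str) -> None:
    """Update the one stored record; no per-folder copies need to be touched."""
    stg_store[stg_name]["raw_text"] = new_text


modify("D4.STG", "revised draft")
views = [open_via(folder, "D4.STG") for folder in ("F1", "F2", "F4")]
```

All three entries in `views` are the same object, which is what rules out divergent versions.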

INDEXING AND RETRIEVING

In addition to the core features 150 described above, the present invention
employs a number of document management utilities. The first of these utilities is
the indexing and retrieving utility 157, the focus of which is an index and retrieval
engine. The index and retrieval engine, among other things, maintains an indexing
database comprising an index or list of each document in the document collection
and a cross-reference between each document in the document collection and
various key terms and/or document attributes that are stored for each document in
the corresponding STG file. The indexing database, in turn, is primarily used to
support the document search function, which is described in greater detail below.
Briefly, however, the present invention employs a search engine which has the
ability to compare the information in the indexing database with one or more
user-supplied search terms or attributes. Documents whose indexing information matches
the user-supplied search terms or attributes are then identified and/or retrieved.

The index and retrieval engine also continuously updates the indexing
database. For example, when a new document becomes part of the document
collection, an STG file is created for that document, as explained above. In a
preferred embodiment, the index and retrieval engine also creates a new entry in
the indexing database for the new document, cross-referenced with key terms and
other attributes extracted from the new document's STG file.

In addition, the index and retrieval engine continuously monitors the
contents of existing STG files. If a document is modified, and if the modification
is reflected in the corresponding STG file, the index and retrieval engine updates
the indexing database accordingly.
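A cross-reference between documents and key terms is commonly built as an inverted index. The sketch below uses that technique as a stand-in, since the text does not say how the indexing database is implemented. The class `IndexingDatabase`, its method names, and the simple word tokenizer are assumptions.

```python
import re
from collections import defaultdict


class IndexingDatabase:
    """Hypothetical inverted index keyed by term, holding STG file names."""

    def __init__(self):
        self._terms_by_doc = {}                 # STG name -> set of its terms
        self._docs_by_term = defaultdict(set)   # term -> set of STG names

    def update(self, stg_name: str, raw_text: str) -> None:
        """Add a new entry, or refresh an entry after its STG file changed."""
        self.remove(stg_name)
        terms = set(re.findall(r"[a-z0-9]+", raw_text.lower()))
        self._terms_by_doc[stg_name] = terms
        for term in terms:
            self._docs_by_term[term].add(stg_name)

    def remove(self, stg_name: str) -> None:
        """Drop every cross-reference held for one document."""
        for term in self._terms_by_doc.pop(stg_name, set()):
            self._docs_by_term[term].discard(stg_name)

    def search(self, *terms: str) -> set:
        """Return the documents whose index entries contain all given terms."""
        sets = [self._docs_by_term.get(term.lower(), set()) for term in terms]
        return set.intersection(*sets) if sets else set()


index = IndexingDatabase()
index.update("001.STG", "Press release draft")
index.update("002.STG", "Press kit")
before = index.search("press")
index.update("001.STG", "Meeting notes")    # document modified, entry refreshed
after = index.search("press")
```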

Another related utility is the Universal Resource Locator (URL) indexing
module. Essentially, a URL is a World Wide Web address that furnishes information
regarding the location and, in some cases, the content of particular Web sites
and/or Web documents. The URL indexing module provides the ability to index
this information so that a user can more effectively access a Web site or retrieve a
particular Web document as if it were any other document stored in the document
collection.

In the present invention, there are three exemplary embodiments for
implementing the URL indexing module. The first exemplary embodiment
involves auto-indexing "bookmarks", which identify
commonly used web sites. In accordance with this embodiment, an STG file is
created for each bookmark. The second exemplary embodiment involves
physically copying a URL into memory, and indexing information relating to that
URL. In this embodiment, an STG file is created for the URL. The third
exemplary embodiment involves viewing a particular document located at or
identified by a particular URL. Again, information relating to this document may
be indexed as with any document in the document collection. Moreover, an STG
file is created for the document, and the document can be imported through the
Browser utility, which is described below.

CATEGORIES AND CATEGORIZATION OF DOCUMENTS 

The present invention also employs a categorization utility 159 that
provides different levels of automated assistance in organizing the document
collection. A category is a logical grouping of documents that share some
common attribute or attributes, sometimes referred to as category criteria. For
example, a category may consist of a number of documents that share a common
author, a number of documents that contain at least a predefined number of words,
a number of documents that contain certain key words, or a number of documents
that share a common concept. A more specific example might be a category called
"company press releases" or a category called "all e-mails I've sent out".
Categories can also be defined hierarchically. In other words, a category may
have a subcategory. For example, "all e-mails I've sent out to my group" might
be a subcategory of "all e-mails I've sent out".

The categorization utility 159 implements a category by associating a
corresponding set of category criteria with a folder in the document collection
hierarchy; however, it will be recognized that not every folder in the document
collection hierarchy is associated with a category. For example, in FIG. 3, folder
F2 is associated with a category, whereas folders
F1, F3 and F4 are not associated with a category.

Folders that are associated with a category are, in general, referred to
herein as "smart" folders. They are referred to as smart folders because the
categorization utility 159 continuously searches through the STG file directory, or
a portion thereof, for documents that match the category criteria associated with
each smart folder. If a match is identified, the categorization utility 159 generates
a link between the smart folder and the matching document, through the matching
document's STG file, thus creating the appearance that smart folders automatically
collect matching documents without user interaction.

As stated, the categorization utility 159 may only search a portion of the
STG directory for documents that match the category criteria of a given smart
folder. In an exemplary embodiment of the present invention, the categorization
utility 159 limits its search of the STG directory to only those STG files associated
with documents that are linked to the smart folder's parent folder. For example, in
FIG. 3, F2 is a smart folder. In accordance with this exemplary embodiment, the
categorization utility 159 searches the STG files associated with F1, wherein F1 is
the parent folder of F2. Accordingly, the categorization utility 159 only searches
through the STG files associated with the documents D1, D2, D3 and D4. At
present, only the documents D3 and D4 match the category criteria associated with
the smart folder F2.
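The parent-restricted search can be sketched as a filter over only those documents linked to the parent folder. In this illustration the category criteria are a set of required key words, which is one assumed form of criteria among those the description allows. The function name `categorize` and the key words themselves are made up for the example.

```python
from typing import Dict, List, Set


def categorize(criteria: Set[str], parent_docs: Dict[str, Set[str]]) -> List[str]:
    """Return documents of the parent folder whose key words satisfy the criteria."""
    return sorted(name for name, key_words in parent_docs.items()
                  if criteria <= key_words)   # every required key word is present


# Key words assumed to be read from the STG files of documents linked to parent F1.
f1_docs = {
    "D1": {"memo", "budget"},
    "D2": {"invoice"},
    "D3": {"press", "release"},
    "D4": {"press", "release", "draft"},
}
f2_criteria = {"press", "release"}          # criteria of smart folder F2 (assumed)
f2_links = categorize(f2_criteria, f1_docs)
```

With these assumed key words the result is D3 and D4, matching the outcome described for FIG. 3.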

Generating a link between a document and a smart folder may occur after
an STG file is created for a new document, or it may occur after an existing STG
file has been updated due to the modification of its corresponding document,
wherein the modification caused the document to meet the category criteria of the
smart folder. Similarly, if a document is modified such that the modification
causes the document to no longer meet the category criteria of a particular smart
folder, the link between that document's STG file and the smart folder may be
eliminated.

In accordance with a preferred embodiment of the present invention, the
categorization utility 159 categorizes documents under various smart folders using
one of three possible categorization methods: automatic categorization; semi-automatic
categorization; or manual categorization.

With manual categorization, a document, through its corresponding STG
file, is linked with a particular category, hence a particular folder, when a user
physically "drags and drops" a display screen representation of the document onto
a display screen representation of the folder. As the categorization utility 159 did
not previously nor automatically categorize the document with this folder, the
folder is either not a smart folder or it is an inactive smart folder, or the folder is
an active smart folder, but the document does not otherwise match the
corresponding category criteria. Active versus inactive smart folders are explained
in more detail below.

Semi-automatic categorization involves categorizing a document into any
one or more categories with minimal user interaction. Here, the user constructs
category criteria in the form of a query. The query, in turn, comprises one or
more key terms and/or document attributes which define the category. The user
may also restrict the scope of the query, for example, to particular directories or
document types. The category criteria are then associated with a folder, i.e., a
smart folder, and the categorization utility 159 continuously searches through all
or a portion of the STG files for documents that have attributes matching the
category criteria. If a matching document is identified, a new link is established
between the corresponding smart folder and the matching document through the
document's STG file.

Automatic categorization involves categorizing a document into one or
more categories without any user interaction. Here, each category is represented
by a smart folder that initially contains a "seed" document. The "seed" document
is then analyzed by the categorization utility 159, and the category criteria (i.e.,
the key words and/or attributes) are automatically extracted. Existing documents
and new documents that match the automatically extracted category criteria are
then linked to the smart folder.

The categorization utility 159 can utilize the indexing information to
examine the relationship between the various documents within a particular
category. This feature scores or ranks the relationships. For example, documents
that share a large number of key terms are considered closely related; those that do
not share a large number of key terms are considered less related. The
categorization utility 159 can display the results in the form of an organizational
hierarchy "tree". Branching high in the organizational hierarchy denotes a close
relationship, while branching low in the hierarchy denotes a more distant
relationship.
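A simple way to score relationships of this kind is to count the key terms that each pair of documents has in common. The sketch below does exactly that. The scoring rule (a raw shared-term count), the function name, and the sample key terms are assumptions, since the text does not specify the measure used.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def rank_relationships(key_terms: Dict[str, Set[str]]) -> List[Tuple[str, str, int]]:
    """Score every document pair by shared key terms, most closely related first."""
    pairs = []
    for first, second in combinations(sorted(key_terms), 2):
        shared = len(key_terms[first] & key_terms[second])
        pairs.append((first, second, shared))
    # Python's sort is stable, so pairs with equal scores keep alphabetical order.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


category_docs = {
    "A": {"merger", "press", "release"},
    "B": {"merger", "press", "earnings"},
    "C": {"picnic"},
}
ranking = rank_relationships(category_docs)
```

A hierarchy display could then group the highest-scoring pairs first, although how the tree is actually drawn is not described in enough detail to sketch here.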

In a preferred embodiment, a user can modify the category criteria
associated with a smart folder. This is accomplished through a "modify category
criteria" user interface. Changing category criteria may result in the categorization
utility 159 purging documents from the corresponding category if the documents
no longer match the category criteria. In addition, the categorization utility 159
initiates a search using the modified category criteria, to identify additional
documents in the document collection that are now relevant given the new category
criteria.
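The purge-then-search behavior that follows a criteria change reduces to two set differences. This is a minimal sketch assuming key-word criteria and a hypothetical function `recategorize`. It ignores manually categorized documents, which a fuller treatment would have to exempt from purging.

```python
from typing import Dict, Set, Tuple


def recategorize(linked: Set[str],
                 collection: Dict[str, Set[str]],
                 new_criteria: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (new links, purged documents, newly added documents)."""
    matching = {name for name, words in collection.items() if new_criteria <= words}
    purged = linked - matching    # linked before, but no longer match
    added = matching - linked     # match now, but were not linked before
    return matching, purged, added


collection = {
    "D1": {"press", "release"},
    "D2": {"press", "kit"},
    "D3": {"memo"},
    "D4": {"release"},
}
old_links = {"D1", "D2"}          # result of the old criteria {"press"}
new_links, purged, added = recategorize(old_links, collection, {"release"})
```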

In a preferred embodiment of the present invention, a user may designate a
smart folder as active or non-active. For each active smart folder, the
categorization utility 159 continuously searches the STG file directory, as
described above, for documents that match the category criteria associated with
each of the smart folders. For inactive smart folders, the categorization utility 159
does not continuously search the STG file directory for documents that match the
category criteria of the various inactive smart folders; however, a user is able to
manually categorize a document with a non-active smart folder. Active smart
folders are sometimes referred to as "hungry" folders.

Smart folders can also be reactive. In accordance with a preferred
embodiment of the present invention, a user can program a smart folder with
particular behavioral characteristics, such that a particular task or tasks are
automatically performed on or with the documents linked with that smart folder.
For example, the user may program a smart folder to automatically e-mail all
documents stored therein to a particular e-mail address. In another example, the
user may program a smart folder to periodically display folder updates, such as the
addition or deletion of documents.
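Reactive behavior of this kind is often modeled as a callback that runs whenever a document is linked to the folder. The sketch below takes that approach as an assumption. The class `ReactiveFolder`, the `queue_email` function, and the address are hypothetical, and no mail is actually sent: the action is only recorded in a list.

```python
from typing import Callable, List, Tuple


class ReactiveFolder:
    """Hypothetical smart folder that runs a user-programmed action on each new link."""

    def __init__(self, name: str, on_link: Callable[[str], None]):
        self.name = name
        self.links: List[str] = []
        self._on_link = on_link

    def link(self, stg_name: str) -> None:
        """Link a document once, then trigger the programmed action for it."""
        if stg_name not in self.links:
            self.links.append(stg_name)
            self._on_link(stg_name)


outbox: List[Tuple[str, str]] = []   # stands in for a real mail queue


def queue_email(stg_name: str) -> None:
    """Record that the document should be e-mailed to an assumed address."""
    outbox.append(("reviewer@example.com", stg_name))


folder = ReactiveFolder("Press releases", queue_email)
folder.link("001.STG")
folder.link("001.STG")   # already linked, so the action does not run again
```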

With respect to semi-automatic and automatic categorization, there are two
filter types associated with each smart folder. The first filter type generates an
inclusion list. The inclusion list identifies those documents that were not
automatically included in the category associated with the smart folder during the
categorization process. The inclusion list may provide the user with an indication
that the category criteria associated with that category are too restrictive. The
second filter type generates an exclusion list. The exclusion list identifies those
documents that were not automatically excluded from the category associated with
the smart folder during the categorization process. The exclusion list may provide
the user with an indication that the category criteria associated with that category
are not restrictive enough (i.e., the category criteria are too aggressive). Both lists
are manually manipulated by the user. Accordingly, the user can modify the two
lists as needed.

As explained above, the data defining the links that are established between
the various smart folders and the documents in the document collection are
maintained in the DCO file. If a user modifies the contents of a document and that
modification causes a change in the link or links between that document, through
its corresponding STG file, and one or more smart folders, a change notification
utility (not shown in FIG. 1B) updates the DCO file to reflect the changes
accordingly. More specifically, the change notification utility modifies the DCO
file to reflect the newly created links and/or the deletion of links. The change
notification utility also updates the thumbnail representations if needed. And if the
user deletes a document in its entirety, the change notification utility deletes all of
the links associated with that document from the DCO file.
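The two update cases, a modification that changes which smart folders a document matches and an outright deletion, can be sketched against a simple folder-to-links mapping. The mapping `dco_links`, both function names, and the sample data are assumptions. The real DCO file layout is not specified in the text.

```python
from typing import Dict, List, Set


def update_smart_links(dco_links: Dict[str, List[str]], smart_folders: Set[str],
                       stg_name: str, now_matching: Set[str]) -> None:
    """After a modification, make smart-folder links agree with current matches."""
    for folder in smart_folders:
        links = dco_links[folder]
        if folder in now_matching and stg_name not in links:
            links.append(stg_name)      # newly created link
        elif folder not in now_matching and stg_name in links:
            links.remove(stg_name)      # link no longer justified


def delete_document(dco_links: Dict[str, List[str]], stg_name: str) -> None:
    """After a deletion, remove the document's links from every folder."""
    for links in dco_links.values():
        if stg_name in links:
            links.remove(stg_name)


dco_links = {"F1": ["D3.STG", "D4.STG"], "F2": ["D3.STG", "D4.STG"]}
update_smart_links(dco_links, {"F2"}, "D3.STG", now_matching=set())
after_modify = {folder: list(links) for folder, links in dco_links.items()}
delete_document(dco_links, "D4.STG")
```

Only smart folders are re-evaluated after a modification, so the link from the ordinary folder F1 is left alone until the document is deleted.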

There is a user interface for change notification [...].
This user interface allows the user to select a few preferences with respect to the
change notification utility. More particularly, the user can select how often the
utility is to perform the updates, as well as the types of file changes that trigger an
update notification.

IMPORTING DOCUMENTS

The present invention also employs a document importing utility 161. The
document importing utility permits the present invention to import electronic
documents or electronic representations of paper-based documents from various
sources. For example, the document importing utility 161, as shown in FIG. 1A,
can import documents from a scanner, from an external memory such as a hard
drive, from a LAN, or from the Internet.

The first feature associated with the document importing utility 161 is the
scanner module. The scanner module controls a scanner connected to the host
computer system. More specifically, the scanner module allows the user to set up
the scanner. It also controls the scanning process and the process of saving the
electronic representation of the document being scanned.

In a preferred embodiment of the present invention, there are a number of
user interfaces associated with the scanner module. The first user interface is the
scanner preferences interface, and there are four display options associated with
the scanner preferences interface. The first display option allows the user to
define various scanner options, as illustrated in FIG. 5. The second display option
is for defining image file options, as illustrated in FIG. 6. The third display option
is for setting up scan-to-category options, as illustrated in FIG. 7. This option
permits the user to scan a document directly into a desired category. The fourth is
for defining options with respect to multiple page documents, as illustrated in
FIG. 8.

With particular regard to the multiple page option interface illustrated in
FIG. 8, this set of user-defined options is for controlling the scanner's automatic
document feeder (ADF). Of course, this particular scanner preference option is
enabled only if the scanner has an ADF. If the user selects the "Check ADF
continuously" box 805, the scanner is polled at a predefined interval to determine
whether there is paper in the ADF waiting to be scanned. If there is paper in the
ADF, it is scanned, and the document is saved according to the other
above-identified scanner preference options. If there is more than one page being
scanned, there are a number of additional options as illustrated in FIG. 8. If the
user selects the "Prompt for more pages at the end of the scan" box 810, the user
will be prompted to append additional pages to the document at the end of the
current scanning operation.
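
By way of illustration only, the continuous ADF check might be sketched as a
polling loop. The Scanner interface, the poll_adf function and its parameters are
hypothetical; an actual scanner driver would expose its own calls, and the sketch
bounds the number of polls only so that it terminates.

```python
import time
from typing import Callable, List, Protocol


class Scanner(Protocol):
    """Hypothetical scanner interface assumed by this sketch."""
    def has_paper(self) -> bool: ...
    def scan_page(self) -> bytes: ...


def poll_adf(scanner: Scanner, save: Callable[[List[bytes]], None],
             interval_s: float, max_polls: int,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll the feeder at a fixed interval; scan and save whatever is waiting.

    Returns the number of documents saved.
    """
    saved = 0
    for _ in range(max_polls):
        pages: List[bytes] = []
        while scanner.has_paper():          # drain the feeder page by page
            pages.append(scanner.scan_page())
        if pages:
            save(pages)                     # one multi-page document per batch
            saved += 1
        sleep(interval_s)
    return saved


class FakeScanner:
    """In-memory scanner used only to exercise the loop."""
    def __init__(self, pages: List[bytes]) -> None:
        self._pages = list(pages)

    def has_paper(self) -> bool:
        return bool(self._pages)

    def scan_page(self) -> bytes:
        return self._pages.pop(0)


documents: List[List[bytes]] = []
count = poll_adf(FakeScanner([b"page1", b"page2"]), documents.append,
                 interval_s=0.0, max_polls=3, sleep=lambda s: None)
```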

The second user interface associated with the scanner module is the scanner
control interface. The scanner control interface is illustrated in FIG. 9. When the
scan button 905 in the center of the scanner control interface is selected, the
scanner module begins scanning a document in accordance with the scanner
preference options described above. If the user selects the scanner control
interface title bar 910, the scanner preferences interface described above is
displayed, thus allowing the user to accept or change the current scanner
preference options.

As previously stated, an STG file is created for each new document in the
document collection. In addition, the index and retrieval engine indexes each new
document based on the attribute data in the corresponding STG file, and the
categorization utility 159 links each new document with the appropriate smart
folders. Each of these features holds true for new documents that have been
scanned into the document collection as well as those which have entered the
document collection via other mechanisms. This saves the user from having to
physically interact with a particular document or documents after they have been
scanned.

The second feature associated with the document importing utility 161 is
the file import module. The file import module is responsible for extracting and
saving attribute information in the STG file of a newly imported document. The
attribute information extracted from each document by the file import module
depends, to some extent, upon the file type. With regard to word processing type
documents, the file import module extracts and saves the raw text information.
The file import module also extracts a 96 x 96 pixel map for generating a
thumbnail image of the first page of each document. Thumbnails contain actual
document information, and are primarily used in conjunction with the Browser
utility to help a user quickly identify specific files. FIG. 10 illustrates an
exemplary display from the Browser utility which contains a number of thumbnail
representations 1005 for the category entitled "My Documents" as indicated in text
box 1010. For image files, the file import module extracts a thumbnail map and
any meta-text associated with the image file. Meta-text contains the content and
position information for any alpha-numeric information appearing in the image.
The file import module then converts the alpha-numeric information into plain
text. The plain text can then be used by the index and retrieval engine as
described above. Therefore, image files can be indexed and categorized just like
word processing and other text files.
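
By way of illustration only, the conversion of meta-text into plain text might be
sketched as follows. The disclosure does not specify the meta-text layout or the
conversion rule; the sketch assumes that each meta-text item carries recognized
text and a pixel position, and that items are read top to bottom and left to right.
The names MetaTextItem, meta_text_to_plain and line_tolerance are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class MetaTextItem:
    """Hypothetical meta-text record: recognized text plus its position."""
    text: str
    x: int  # left edge, in pixels
    y: int  # top edge, in pixels


def meta_text_to_plain(items: Iterable[MetaTextItem],
                       line_tolerance: int = 5) -> str:
    """Order items top-to-bottom, left-to-right and join them into plain text."""
    lines: List[List[MetaTextItem]] = []
    for item in sorted(items, key=lambda i: (i.y, i.x)):
        # An item close to the vertical position of the current line joins it.
        if lines and abs(item.y - lines[-1][0].y) <= line_tolerance:
            lines[-1].append(item)
        else:
            lines.append([item])
    return "\n".join(
        " ".join(i.text for i in sorted(line, key=lambda i: i.x))
        for line in lines
    )


items = [
    MetaTextItem("Total:", 10, 102),
    MetaTextItem("Invoice", 10, 20),
    MetaTextItem("42.00", 90, 100),
    MetaTextItem("1001", 80, 22),
]
plain = meta_text_to_plain(items)
```

The resulting plain text could then be handed to an indexing step in the same
way as the raw text of a word processing document.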

When a color image containing text is to be scanned, the user can specify
that the color image is to be scanned using a two-pass scanning process. The first
pass is a low resolution scan which converts the document into a desired image
format, e.g., TIFF, JPEG, etc. The second pass is a higher resolution pass that
is conducted on a non-color or non-gray scale version of the image. This second
scanning pass is used to obtain the position of the meta-text described above.

The third feature associated with the document importing utility 161 is the
failed import recovery feature. If, upon importing a document into the document
collection, the file import module is unable to determine the document format, the
user is prompted to define the format.

BROWSING DOCUMENTS

The present invention includes a document browsing utility 163. The
Browser utility 163 permits the user to quickly and efficiently review the document
collection or a portion thereof. Moreover, it allows the user to view the
documents and the document categories as they are logically arranged in the
organizational tree described above. In addition, the Browser utility 163 permits
the user to manipulate documents and document categories; to copy, move and
delete documents and document categories; to view and print documents; and to
bundle multiple documents into a compound document entity referred to as a
clipped document. Clipped documents are described in greater detail below.

There are two basic user interfaces associated with the Browser utility 163.
The first is referred to as My Computer, as illustrated in FIG. 10. When viewing
documents and document categories with the My Computer interface, there is one
display panel 1015 for displaying a representation of each document, clipped
document or folder. In FIG. 10, the document representations are displayed as
thumbnails, although other representations are available such as small or large
icons. The second user interface is referred to as the Explorer interface, as
illustrated in FIG. 11. In contrast, Explorer has two display panels: a right display
panel 1105 and a left display panel 1110. While the left panel 1110 displays the
folders and/or document categories, including the one currently opened, the right
panel 1105 displays a representation of each document, clipped document and/or
folder associated with the currently opened folder or document category. Again,
the representations appearing in the right panel 1105 can take the form of
thumbnails, small icons, or large icons. In FIG. 11, the representations are in the
form of small icons.

The Browser utility 163 allows the user to interact with the documents in
the document collection in a number of different ways. Using a mouse or cursor,
the user can open documents in a corresponding host application. The user can
open a category and display the documents, clipped documents, folders and/or
subcategories associated therewith. The user can open a context menu for a
given document, as shown in FIG. 11, wherein the context menu 1115 provides
the user with a number of additional options as illustrated.

These two user interfaces associated with the Browser utility 163, My
Computer, as illustrated in FIG. 10, and Explorer, as illustrated in FIG. 11, each
have a number of standard pull-down menus. The FILE pull-down menu, for
example, allows the user to, among other options, open documents, clipped
documents, or document categories; send documents or clipped documents as
e-mail messages; create new categories; import new documents from the scanner;
and delete, rename and/or list the properties of documents, clipped documents and
categories. The EDIT menu allows the user to copy, paste, and select all or part of
a document. This menu also allows the user to clip or unclip documents. The
VIEW menu, among other options, allows the user to display a customizable
application toolbar, to be described below, and to control the arrangement and
display representation of each document. In addition, there are also TOOLS,
TEST and HELP menus.

In order to import a document into the system's document collection from
the Browser utility 163, a user can exercise one of several options. For example,
a user can drag a document from the computer operating system desktop
environment and drop it onto a Browser utility icon also appearing on the desktop.
If one of the two above-identified Browser utility interfaces is active, the user can
drag a document from the desktop environment and drop it into one of the
aforementioned panels in a location that is unoccupied by another icon or
thumbnail. As described above, the categorization utility 159 automatically
categorizes these documents based on the document attributes extracted and then
stored in their corresponding STG files. The user can also drag a document from
the desktop environment into a particular category representation appearing in the
Browser interface, thus manually categorizing the document. Finally, the user
can cut and paste all or part of a document, or the user can scan in a document.

The user can also initiate a scanning operation from the Browser utility
163. A representation of the scanned image can be [...]
scanned directly into one or more categories based on the user specified scanner
options described above. This too is handled by a task manager utility 165 which
is described in greater detail below.

The user may also opt to display the customizable application toolbar,
mentioned above, with either of the two Browser interfaces. An exemplary toolbar
1205 is illustrated in FIG. 12. The toolbar 1205 makes it easier for the user to
directly interact with documents maintained in the document collection. For
example, by employing the toolbar 1205, the user is able to drag and drop
application program icons or buttons (i.e., buttons or icons which, if selected,
launch an application program such as Microsoft Excel, Microsoft Word,
Netscape, or WordPerfect), thus allowing the user to quickly open one or more
application programs and to convert, view and/or edit documents on-the-fly. This
on-the-fly document conversion is accomplished by employing conversion filters to
convert the various file formats. As stated, the user is able to quickly execute
other functions with the customizable application toolbar, such as send e-mail,
transmit facsimiles, and initiate print jobs.

The Browser utility 163 is also capable of displaying a representation for
one or more transitional documents. A transitional document is a document that is
currently being processed by the importing utility 161. During the period in which
a document is being processed by the importing utility 161, the Browser utility 163
displays an "in-transition" icon for that document, for example, the in-transition
icons 1305 shown in FIG. 13. However, an in-transition icon is a temporary
representation. When the importing utility 161 finishes processing the document,
the Browser utility 163 automatically replaces the in-transition icon with the
appropriate thumbnail representation, a small icon or a large icon, depending upon
the current Browser utility display settings described above. In-transition icons
provide the user with an easily recognizable representation for each of the one or
more transitional documents.

SEARCHING DOCUMENTS

The present invention also includes a document searching utility 167. The
searching utility 167, in turn, employs a search engine that globally searches the
document collection (i.e., the STG files in the STG file directory) and retrieves
documents that fit or match a number of user-defined conditions with respect to
text, meta-text, and/or other file attributes (e.g., document author, date, size,
format).

In a preferred embodiment of the present invention, there are two search
types which the user can initiate: a basic search and an advanced search. The
basic search allows the user to search the document collection using a search query
that contains only words or phrases. The advanced search allows the user to build
a query that contains words and/or phrases as well as other file attributes, and it
allows the user to combine the various words, phrases and other file attributes with
Boolean operators.
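
By way of illustration only, the difference between the two search types might be
sketched as follows. The sketch is an assumption and not the disclosed search
engine: documents are modeled as simple dictionaries standing in for STG file
data, and the helper names contains, attribute, all_of, any_of, negate and search
are hypothetical.

```python
from typing import Callable, Dict, List

Document = Dict[str, str]               # hypothetical STG-like record
Condition = Callable[[Document], bool]


def contains(phrase: str) -> Condition:
    """Word or phrase condition; the only kind a basic search uses."""
    return lambda doc: phrase.lower() in doc["text"].lower()


def attribute(name: str, value: str) -> Condition:
    """File attribute condition, available to an advanced search."""
    return lambda doc: doc.get(name) == value


def all_of(*conditions: Condition) -> Condition:   # Boolean AND
    return lambda doc: all(c(doc) for c in conditions)


def any_of(*conditions: Condition) -> Condition:   # Boolean OR
    return lambda doc: any(c(doc) for c in conditions)


def negate(condition: Condition) -> Condition:     # Boolean NOT
    return lambda doc: not condition(doc)


def search(collection: List[Document], query: Condition) -> List[str]:
    """Return the names of the documents that satisfy the query."""
    return [doc["name"] for doc in collection if query(doc)]


collection = [
    {"name": "report.txt", "text": "Annual budget report", "author": "Lee", "format": "TXT"},
    {"name": "scan.tif", "text": "Budget memo", "author": "Kim", "format": "TIF"},
    {"name": "notes.txt", "text": "Meeting notes", "author": "Lee", "format": "TXT"},
]

basic = search(collection, contains("budget"))
advanced = search(collection,
                  all_of(contains("budget"), negate(attribute("format", "TIF"))))
```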

Whether the user invokes a basic search or an advanced search, the
searching procedure is essentially the same. The user enters a desired query, then
selects a FIND NOW option in the corresponding user interface, which is
described in greater detail below. The results are then displayed. The user then
selects one or more of the identified documents, if desired.

As stated, there is a user interface for the basic search and a user interface
for the advanced search. The user interface for the basic search is illustrated in
FIG. 14. As shown in FIG. 14, the user interface is divided into an upper portion
1405, which is reserved for building search queries, and a lower portion 1410,
where the results of a given search are displayed. The lower portion 1410 is
referred to as the results listbox.

After the user builds a basic search query, the user selects the FIND NOW
option 1415 on the user interface to initiate the search. The search engine then
performs the search for that query. As the indexing engine finds a document that
matches the search criteria defined by the query, it
informs the search utility 167. The search utility 167, in turn, displays the name
of the document in the results listbox 1410 of the basic search user interface. At
any time during the search, the user can select the STOP option 1420 on the user
interface, which forces the indexing engine to terminate the search.

With regard to the results listbox 1410, the user can view the identified
documents as small icons, large icons, thumbnails, or as a detailed list of
documents. In an exemplary embodiment, the search utility 167 creates one or
more smart folders and displays them in the results listbox 1410. Each of the one
or more smart folders has category criteria associated with a particular level of
relevance (e.g., a number of search hits). Documents identified during the search
are linked to one of these smart folders depending upon the actual relevancy of the
document. For example, the search utility 167 may create two smart folders. The
first smart folder's category criteria may be documents identified during the search
operation having 10 or more search hits. In contrast, the second smart folder's
category criteria may be documents identified during the search having fewer than
10 search hits. The search utility 167 then links the documents identified during
the search to either the first or the second smart folder accordingly. The smart
folders, along with the documents linked thereto, are then displayed in the results
listbox 1410. In another example, the search operation may create a number of
smart folders which are displayed in the listbox 1410, wherein each smart folder
may be linked to a group of documents that share a certain number of key search
terms. Alternatively, each smart folder may be linked to a group of documents
containing key search terms that exhibit a certain semantic similarity.
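
By way of illustration only, the two-folder example above might be sketched as
follows. The way search hits are counted is not specified in the disclosure; the
sketch assumes a hit is one whole-word, case-insensitive occurrence of a search
term, and the names count_hits and group_by_relevance are hypothetical.

```python
import re
from typing import Dict, List


def count_hits(text: str, terms: List[str]) -> int:
    """Count whole-word, case-insensitive occurrences of the search terms."""
    wanted = {term.lower() for term in terms}
    return sum(1 for word in re.findall(r"\w+", text.lower()) if word in wanted)


def group_by_relevance(documents: Dict[str, str], terms: List[str],
                       threshold: int = 10) -> Dict[str, List[str]]:
    """Link each identified document to one of two relevance folders."""
    high = f"{threshold} or more hits"
    low = f"fewer than {threshold} hits"
    folders: Dict[str, List[str]] = {high: [], low: []}
    for name, text in documents.items():
        hits = count_hits(text, terms)
        if hits == 0:
            continue                      # not identified by the search at all
        folders[high if hits >= threshold else low].append(name)
    return folders


documents = {
    "a.txt": "scanner " * 12,
    "b.txt": "scanner settings",
    "c.txt": "unrelated text",
}
result = group_by_relevance(documents, ["scanner"])
```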

In accordance with another exemplary embodiment, the search operation
may identify one or more existing categories whose category criteria, in whole or
in part, overlap the key search query criteria. The search results might then be
organized such that the one or more existing categories are listed. The user could
then view those documents associated with each category that meet the search
query criteria.

The user can also select any number of the identified documents and then
select the SIMILAR DOCS option 1425 on the basic search user interface. The
search utility 167 then queries the indexing engine to identify all documents similar
to those selected. For example, the indexing engine might identify all documents
that are similarly categorized. The newly identified documents are then displayed
in the results listbox 1410.

The advanced search user interface is accessed through the basic search
user interface by selecting the ADVANCED option 1430. The advanced search
user interface is illustrated in FIG. 15. As stated above, the primary difference
between a basic search and an advanced search is that with an advanced search, the
user can conduct more sophisticated searches with words, phrases, file attributes
and/or a combination thereof using Boolean operators. A file attribute refers to
any number of file characteristics, for example, document size, publication date,
author, or document source (i.e., files with a particular extension such as *.TIF,
*.TXT, *.HTM).

Although the advanced search user interface, like the basic search user
interface, includes an upper portion 1505 for building search queries, and a lower
portion 1510 for displaying search results, the advanced search user interface also
includes a number of additional options not available for basic searching. For
example, the user can modify the scope of an advanced search by entering a
specific category in the SCOPE EDIT BOX 1515. By selecting the BROWSE
option 1520, a category tree is displayed, which allows the user to select,
therefrom, a category for limiting the scope of the advanced search. Accordingly,
the selected category is displayed in the SCOPE EDIT BOX 1515. The user can
also limit the scope of an advanced search to the contents of each document,
excluding document annotations; or the user can include the annotations; or the
user can limit the search to only document annotations. This is accomplished by
selecting the box 1525 entitled "Include Documents" and/or the box 1530 entitled
"Include Annotations". Finally, the user can return to the basic search user
interface by selecting the BASIC option 1535. If the user selects this option, all
search conditions are lost except those containing exclusively words and/or
phrases.

VIEWING DOCUMENTS 

The next utility employed by the present invention is the document viewing
utility 169. The document viewing utility 169 allows the user to view an entire
document regardless of document type or document format, even if the
corresponding host application cannot be launched.

The document viewing utility 169 user interface, as illustrated in FIG. 16,
is accessed through the Browser utility 163. The document viewing user interface
comprises two panes, a right pane 1605 and a left pane 1610, as illustrated in FIG.
16. The left pane 1610 displays an icon or thumbnail of the document that is being
viewed in the right pane 1605. In a preferred embodiment, a thumbnail
representation is used if the document is an image. If, instead of a document, a
clipped document is being viewed, then the icons or thumbnails for each individual
document associated with the clipped document are displayed in the left pane 1610.

The document viewing user interface, like the browser user interfaces,
includes a customizable application toolbar 1615 as illustrated in FIG. 16. Again,
the toolbar 1615 is customizable in that the user can drag and drop functional
buttons into the toolbar 1615 as described above, particularly buttons that, when
selected, launch an application program which the user may need to properly view
the documents. In addition, the toolbar 1615 may contain a number of functional
buttons. In FIG. 16, the toolbar 1615 includes, from left to right, buttons for
opening, saving, printing, hand scrolling, annotating, zooming, and advancing
forward or back one page of the document being viewed.

The document viewing utility 169 also highlights category criteria. In
other words, it highlights the various key words, phrases, and/or attributes in the
document being viewed, which make up the category criteria, assuming, of course,
the document has been categorized.



CLIPPED DOCUMENTS 

The present invention also utilizes a document clipping utility 171. This
utility allows a user to combine several documents into a compound document
entity herein referred to as a clipped document. More specifically, a clipped
document is a form of compound document that contains zero or more documents
of any type or format. For example, a clipped document may contain an image
document, a Microsoft Word document, a WordPerfect document and a web page
in HTML format.

Clipped documents are different from ordinary file folders. First, clipped
documents maintain the order in which each component document appears. In
other words, each of the component documents that are associated with a clipped
document maintains a relative position within the clipped document with respect to
the other component documents. Second, clipped documents provide the user with
the ability to quickly and simply manipulate a set of related documents as a group.
For example, a user can e-mail a clipped document to another user, and the other
user actually receives the documents as a clipped document. If the host computer
being operated by the other user is not executing the present invention, the other
user receives each of the documents individually.

Although the user can manipulate the component documents as a group,
there are other instances when the component documents associated with a clipped
document are manipulated individually. For example, the search engine, in
performing a basic or advanced search, identifies each component document within
a clipped document, assuming they meet the search criteria, including the level of
relevance of each individual component document.

As explained above, the present invention employs a virtual document
storage feature. Accordingly, clipped documents do not physically contain a copy
of each component document. Rather, each clipped document has a corresponding
STG file, as described above, and as illustrated in [...]
associated with a clipped document contains a link to the STG file of each
component document (see FIG. 2A). Once again, this virtual document storage
feature saves valuable memory space and it helps maintain document integrity
(i.e., a single, up-to-date version of each document).
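
By way of illustration only, a clipped document that holds ordered links rather
than copies might be sketched as follows. The ClippedDocument class, the file
paths and the dictionary standing in for the STG file directory are hypothetical
and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ClippedDocument:
    """Hypothetical clipped document: an ordered list of links, not copies."""
    name: str
    links: List[str] = field(default_factory=list)  # component STG paths, in order

    def append(self, stg_path: str) -> None:
        """Add a component at the end, preserving relative positions."""
        self.links.append(stg_path)

    def resolve(self, stg_store: Dict[str, dict]) -> List[dict]:
        """Look up each component through its link, in clipped order."""
        return [stg_store[path] for path in self.links]


# Hypothetical stand-in for the STG file directory.
stg_store = {
    "stg/0001.stg": {"title": "Contract", "version": 1},
    "stg/0002.stg": {"title": "Invoice", "version": 1},
}

bundle = ClippedDocument("Deal")
bundle.append("stg/0002.stg")
bundle.append("stg/0001.stg")

stg_store["stg/0001.stg"]["version"] = 2   # one edit, visible through every link
titles = [(d["title"], d["version"]) for d in bundle.resolve(stg_store)]
```

Because only links are stored in such a model, an update to a component is made
once and is seen by every clipped document that refers to it.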

Just as individual documents can belong to more than one category, clipped
documents can belong to more than one category. A clipped document can also
belong to no categories.

In a preferred embodiment, there are six ways in which a user can create a
clipped document. First, from the Browser utility 163, a user can drag the
representation of a source document D1 and drop it onto the representation of a
destination document D2, as illustrated in FIGs. 17A-17D. The Browser utility
163 creates a new clipped document S in the category containing the destination
document D2. A representation of the new clipped document S then appears to
subsume the representation of the destination document D2. At the same time, the
source document D1 remains in the source category or is removed from the source
category depending upon whether the user executes a copy operation or a move
operation.

Second, from the Browser utility 163, a user can drag the representation of
an existing clipped document and drop it onto a destination document. The
Browser utility 163 causes the destination document to become concatenated with
the clipped document, which in turn appears to subsume the representation of the
destination document. A representation of a new clipped document, once again,
appears in the category containing the destination document. Also, the existing
clipped document remains in or is removed from the source category depending
upon whether the user executes a copy operation or a move operation.

Third, from the Browser utility 163, a user can drag the representation of a
source document and drop it onto the representation of an existing clipped
document in a destination category. Here, the source document is appended to the
existing clipped document in the destination category. The source document
remains in or is removed from the source category depending upon whether the
user executes a copy operation or a move operation.

Fourth, from the Browser utility 163, a user can drag the representation of
a clipped document from a source category and drop it onto a representation of a
clipped document in a destination category. Accordingly, the component
documents associated with the source clipped document are appended to the
destination clipped document. The source clipped document remains in or is
removed from the source category depending upon whether the user executes a
copy operation or a move operation.

Fifth, from the Browser utility 163, a user can simply create a new clipped
document. The user can then designate that the clipped document is to be
associated with a particular category.

Sixth, from the document viewing utility 169, a user can drag the
representation of a document in a source category or the representation of a
clipped document in a source category and drop it onto the representation of the
document being viewed. The document or documents associated with the clipped
document are appended to the document being viewed, thus creating a new clipped
document in the category containing the document being viewed. Once again, the
source document or source clipped document remains in or is removed from the
source category depending upon whether the user executes a copy operation or a
move operation.

A user can also unclip a clipped document. Upon executing an unclipping
operation, the representation of the clipped document is removed and the
representations of the component documents are made visible in the corresponding
Browser user interface. Additionally, the user can delete a clipped document,
either locally or globally. From within a particular category, the user merely
executes a delete clipped document command, wherein the Browser utility 163
deletes the clipped document, along with the component documents, from that
category. From the "My Documents" category, a user can execute a delete clipped
document command, wherein the Browser utility 163 deletes the clipped document
from every category.

Other software applications, such as Microsoft Office, employ compound
document entities; however, these entities differ from clipped documents. For
example, Microsoft Office uses "binders". Unlike clipped documents, binders
only allow a user to mix Microsoft Excel, PowerPoint and Word documents.
Binders do not allow the user to bind non-Microsoft Office formatted documents.
Another major difference between binders and clipped documents is that binders
are monolithic documents. In other words, the component documents cannot be
individually manipulated. In addition, a user of Microsoft Office cannot e-mail a
binder to another person unless the other person is running Microsoft Office. Yet
another difference between Microsoft Office binders and clipped documents is the
fact that binders maintain a physical copy of each component document whereas
the present invention employs a virtual document storage feature. As explained
above, maintaining physical copies is an inefficient use of memory space, and it
can lead to multiple versions of a single document, since a single document may be
stored in more than one binder.

FILE HELPER

A next utility is the file helper or archiving utility 173. The file helper
utility keeps the document collection tidy. More specifically, the file helper utility
automatically archives files onto removable media, if, in general, those files have
not been accessed or modified for a long period of time. The file helper utility
also notifies the user if files have old dates; it notifies the indexing engine and the
DCO file when files are taken off-line; it monitors the document collection for
document duplicates; and it organizes a separate index of off-line documents.

The file helper utility has a user interface, as illustrated in FIG. 18. As one 
skilled in the art will readily appreciate, the user inter&ce illustn^ in FIG. 18 

^ pennits^eiisertorsele ct one or more var ious-TOnd itions t hat triggerQie automatic 

archiving process. The file helper utility also prompts a user, if the user so 
desires, before the system archives a document in accordance with the user 
selected options. In addition, there are a number of secondary user interfaces 
associated with the file helper utility 173, for example, the user interface 

25 illustrated in FIG. 19. The secondary user interfaces are utilized for entering more 
specific archiving conditions, such as the exact size of a document or the age of a 
document that trigger this archiving utility 173. 

The file helper utility continues to maintain a link to or an index of each
archived document by storing a thumbnail representation of each archived
document and/or the STG file associated with each archived document. Therefore,
if a user wishes to run a search involving archived documents, the search engine is
capable of searching the content of the thumbnail representations and/or the STG
file data fields of each archived document. If the search identifies one or more
archived documents, the file helper utility prompts the user to make the
appropriate removable storage medium available (e.g., prompts the user to insert a
particular floppy disk) in the event the user wishes to access the archived
document.
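
The off-line index described above might be sketched as follows. This is an illustrative assumption rather than the disclosed implementation: the names OfflineEntry, search_offline and media_prompt are hypothetical, the retained STG data is modeled as a plain dictionary of text fields, and matching is a simple case-insensitive substring test.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OfflineEntry:
    """Hypothetical stand-in for what stays on-line after archiving."""
    doc_name: str
    medium_label: str                 # e.g. the label written on a floppy disk
    stg_fields: Dict[str, str] = field(default_factory=dict)


def search_offline(index: List[OfflineEntry], term: str) -> List[OfflineEntry]:
    """Return entries whose retained field values contain the search term."""
    needle = term.lower()
    return [
        entry for entry in index
        if any(needle in value.lower() for value in entry.stg_fields.values())
    ]


def media_prompt(entry: OfflineEntry) -> str:
    """Build the message asking the user to make the medium available."""
    return f"Insert '{entry.medium_label}' to open {entry.doc_name}"
```

Because only the retained fields are searched, a hit can be reported while the document itself is still on removable media; the prompt is needed only when the user chooses to open the document.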

DIRECTORY MONITOR 

There is also a directory monitor utility 175 that monitors specific
user-identified directories, categories, and/or folders on a particular storage device for
newly stored documents. When the directory monitor utility 175 identifies newly
stored documents, the categorization utility 159 automatically categorizes these
documents into the appropriate categories or smart folders, as described above.
Again, there is a user interface associated with the directory monitor utility 175 as
illustrated in FIG. 20. As shown, the user interface provides the user with a vehicle
to select the particular directories to be monitored.
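
One simple way to detect newly stored documents is to compare successive snapshots of the monitored directories. The following sketch assumes a polling approach, which the specification does not state; the names snapshot and poll_once are hypothetical, and the categorize callback merely stands in for the categorization utility 159.

```python
import os
from typing import Callable, Iterable, Set


def snapshot(directories: Iterable[str]) -> Set[str]:
    """Collect the paths of all regular files directly inside the directories."""
    found = set()
    for directory in directories:
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_file():
                    found.add(entry.path)
    return found


def poll_once(
    directories: Iterable[str],
    known: Set[str],
    categorize: Callable[[str], None],
) -> Set[str]:
    """Hand any file not seen before to `categorize`; return the new snapshot."""
    current = snapshot(directories)
    for path in sorted(current - known):
        categorize(path)
    return current
```

A caller would invoke poll_once repeatedly, passing the returned snapshot back in as known, so that each document is categorized only once.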



TASK MANAGER

The task manager utility 165 is yet another utility employed by the present 
invention. The task manager utility 165 is a multi-threaded single instance utility 
that is launched when the host computer is booted after loading the software 
associated with the present invention or upon a first request for one of its services 
after the utility has been turned off. Its main function, however, is to facilitate
background or batch processing jobs such as importing documents into the
document collection. 

The task manager utility 165 has a corresponding user interface, as 
illustrated in FIG. 21. The user interface includes a queue for displaying a list
2105 of the various tasks currently being undertaken by the task manager utility
165. 

When the task manager utility 165 is first initiated, for example, if the user 
executes a document import request, a small icon 2210 appears in the system task 
bar at the bottom of the display, as illustrated in FIG. 22. If the user selects the
icon (with a mouse/cursor), the task manager utility 165 responds by opening the
task manager utility 165 user interface. 

The task manager utility 165 user interface also includes a number of
"pull-down" menus, as illustrated in FIG. 21, including a QUEUE menu and a JOB
menu. The QUEUE menu includes, among other options, the option of stopping
the task manager utility 165 from scanning or indexing a document, purging the
queue, and terminating the task manager utility 165. The JOB menu provides
options that include purging a document from the queue, and changing the priority
in which the task manager utility 165 executes the jobs in the queue.
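
The queue operations offered by the QUEUE and JOB menus might be modeled as in the sketch below. The class names Job and TaskQueue, the numeric priority scale (a lower number runs first) and the default priority of 5 are all assumptions made for illustration; the specification does not describe how priorities are represented, and the multi-threaded execution of jobs is omitted.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(order=True)
class Job:
    priority: int                       # lower number runs first (assumption)
    seq: int                            # tie-breaker: first queued, first run
    name: str = field(compare=False)


class TaskQueue:
    """Hypothetical model of the job list shown in the task manager window."""

    def __init__(self) -> None:
        self._jobs: List[Job] = []
        self._counter = itertools.count()

    def add(self, name: str, priority: int = 5) -> None:
        self._jobs.append(Job(priority, next(self._counter), name))

    def set_priority(self, name: str, priority: int) -> None:
        """JOB menu: change the priority of a queued job."""
        for job in self._jobs:
            if job.name == name:
                job.priority = priority

    def purge_job(self, name: str) -> None:
        """JOB menu: remove one job from the queue."""
        self._jobs = [job for job in self._jobs if job.name != name]

    def purge(self) -> None:
        """QUEUE menu: remove every job from the queue."""
        self._jobs.clear()

    def listing(self) -> List[str]:
        """Job names in the order in which they would be executed."""
        return [job.name for job in sorted(self._jobs)]

    def next_job(self) -> Optional[str]:
        """Remove and return the next job to execute, or None if empty."""
        if not self._jobs:
            return None
        job = min(self._jobs)
        self._jobs.remove(job)
        return job.name
```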

ANNOTATIONS 



The present invention has an annotations utility 177 which provides the
user with the option of adding annotations to a document before a scanning
operation is completed. This feature allows the user to automatically manipulate
an image document, including the added annotations, immediately upon completion
of the scan. The user is also permitted to add annotations of almost any type, for
example, text annotations, free-form annotations (i.e., pictures and graphs), and
waveform (i.e., audio) annotations. Drag and drop annotations are also available,
if a user wishes to insert an annotation from one document into another document.
The user can even print annotations apart from the remainder of the accompanying document.

PROPERTY SHEETS 

The present invention also has a property sheet utility 179. The property
sheet utility 179 allows a user to display a property sheet for each individual
category, clipped document and/or document. Property sheets are yet additional
user interfaces which convey specific summary information about a given
category, document and/or clipped document. This summary information may
include particular document attributes such as document size, date, author, or the
number of key words or attributes contained in a document. Moreover, the
summary information for a particular document is stored in the STG file
corresponding to that document. Table I contains a list of potential attributes that
might appear in a property sheet depending upon whether the property sheet
pertains to a category, a clipped document or a document. Property sheets might
also include a brief synopsis or abstract describing the contents of a given
document. In accordance with a preferred embodiment, property sheets can be
accessed either through the Browser utility 163 or the document viewing utility
169.

CATEGORY                            CLIPPED DOCUMENT    DOCUMENT

•location                           •size               •image type (with values of color, 8-bit gray
•size                               •name                 scale, 4-bit gray scale, or binary)
•folder members                     •created date       •doc type (scanner/image/tiff, word processor
•name                               •modified date        document, etc.)
•creation date                      •accessed date      •width dimension (for scanned documents)
•modified date (modifying either    •author             •height dimension (for scanned documents)
  criteria changed or the           •title              •size
  documents it contains changes)    •last saved by      •physical storage id
•author                             •subject            •storage media (e.g., the type of media on
•criteria, threshold score, and                           which a document is stored)
  exemplar document                                     •name
•document members                                       •modified date
•clipped document members                               •accessed date
•inclusion list                                         •author
•exclusion list                                         •key words
•automatically included documents                       •summary
•contains (number of documents,                         •title
  clipped documents, etc.)                              •subject
•static/dynamic (an inactive or                         •categories (i.e., the categories to which a
  hungry category)                                        document belongs)
•up-to-date (meaning is the                             •clipped documents (i.e., the clipped
  category current with regards                           documents to which a document belongs)
  to the documents held in the                          •query hits
  collection)                                           •relevance score (i.e., the relevance score
                                                          the document has to the query)
                                                        •revision number
                                                        •text information (e.g., number of
                                                          characters, etc.)

Table I 
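
As a hedged illustration of how the attribute lists of Table I might drive a property sheet, the sketch below selects the attributes that apply to the kind of item being displayed. The names SHEET_ATTRIBUTES and property_sheet are hypothetical, only a few attributes from each column of Table I are included, and the summary information is modeled as a plain dictionary rather than as an actual STG file.

```python
from typing import Dict, List

# A few attribute names per item kind, taken from Table I (not the full list).
SHEET_ATTRIBUTES: Dict[str, List[str]] = {
    "category": ["location", "size", "name", "creation date", "modified date"],
    "clipped document": ["size", "name", "created date", "author", "title"],
    "document": ["image type", "doc type", "size", "name", "author", "key words"],
}


def property_sheet(kind: str, summary: Dict[str, str]) -> List[str]:
    """Return display lines for the attributes that apply to this kind of item."""
    lines = []
    for attr in SHEET_ATTRIBUTES[kind]:
        if attr in summary:
            lines.append(f"{attr}: {summary[attr]}")
    return lines
```

Attributes present in the summary but not listed for the given kind (for example, an image type on a clipped document) are simply left off the sheet.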

The invention has been described with reference to a preferred exemplary
embodiment. However, it will be readily apparent to those skilled in the art that it is 
possible to embody the invention in forms other than those of the preferred embodiment 
described above. This may be done without departing from the spirit of the invention. 
The preferred embodiment is merely illustrative and should not be considered restrictive 
in any way. The scope of the invention is given by the appended claims, rather than the 
preceding description, and all variations and equivalents which fall within the range of 
the claims are intended to be embraced therein.

WHAT IS CLAIMED IS:

1. A method for managing a document collection in a computer system, said
method comprising the steps of:

importing a document having a first format into the computer system;

storing the document in a memory location;
automatically extracting attribute data from the document; and
generating a data structure for the document, wherein said data structure
contains the attribute data in a second format independent of said first format.

2. The method of claim 1, wherein said step of importing a document into the
computer system comprises the steps of:

optically scanning a paper-based document; and

converting the optically scanned document into an electronic document.

3. The method of claim 2, wherein the first format is an image format.

4. The method of claim 2, wherein the first format is a text format.

5. The method of claim 1, wherein said step of importing a document into the
computer system comprises the step of:
importing an electronic document.

6. The method of claim 5, wherein the first format is a text format.

7. The method of claim 6, wherein the document is a word processing document.

8. The method of claim 6, wherein the document is an e-mail message.

9. The method of claim 5, wherein the first format is an image format.

10. The method of claim 5, wherein the first format is an HTML format.

11. The method of claim 1, wherein the second format comprises at least one data
field.

12. The method of claim 11, wherein the at least one data field contains a file name.

13. The method of claim 11, wherein the at least one data field contains the memory
location.

14. The method of claim 11, wherein the data field contains a bit map.

15. The method of claim 11, wherein the data field contains raw text.

16. The method of claim 11, wherein the data field contains a data attribute.

17. The method of claim 16, wherein the data attribute is an author name.

18. The method of claim 16, wherein the data attribute is a publication date.

19. The method of claim 16, wherein the data attribute is a word count.

20. The method of claim 16, wherein the data attribute is an annotation.

21. The method of claim 16, wherein the data attribute is a key word.

22. The method of claim 16, wherein the data attribute is an image type.

23. The method of claim 16, wherein the data attribute is an image dimension.

24. The method of claim 16, wherein the data attribute is meta-text with positioning
information.

25. The method of claim 1, further comprising the step of extracting indexing
information from the attribute data in the data structure.

26. The method of claim 25 further comprising the steps of:
monitoring modifications to the document; and
extracting updated indexing information.

27. The method of claim 25, wherein the attribute data is derived from a data field
comprising raw text data.

28. The method of claim 25 further comprising the step of:

identifying the document from amongst other documents in the document
collection utilizing the indexing information.

29. The method of claim 1 further comprising the step of:

linking the document to a first electronic folder if the attribute data matches a set
of predefined criteria corresponding to the first electronic folder.

30. The method of claim 29 further comprising the steps of:

electronically analyzing the attribute data stored in the data structure
corresponding to the document;
determining whether the document is to be automatically linked to the first
electronic folder; and

identifying the document on an inclusion list if it is determined that the
document is not automatically linked to the first electronic folder.

31. The method of claim 29 further comprising the steps of:
electronically analyzing the attribute data stored in the data structure
corresponding to the document;

determining whether the document is to be automatically excluded from being
linked to the first electronic folder; and
identifying the document on an exclusion list if it is determined that the
document is not to be automatically excluded from being linked to the first electronic
folder.

32. The method of claim 29 further comprising the steps of:
monitoring document modifications; and

automatically linking the document to a second electronic folder if a document
modification causes the attribute data to match a set of predefined criteria corresponding
to the second electronic folder.

33. The method of claim 29 further comprising the steps of:

monitoring document modifications; and

automatically deleting the link between the document and the first electronic
folder if a document modification causes the attribute information to no longer match
the set of predefined criteria corresponding to the first electronic folder.

34. The method of claim 29, wherein the attribute data is a document title.

35. The method of claim 29, wherein the attribute data is a document author.

36. The method of claim 29, wherein the attribute data is a phrase associated with
the document.

37. The method of claim 29, wherein the attribute data is a key word.

38. The method of claim 29, wherein the attribute data is a common concept.

39. The method of claim 29 further comprising the step of:

automatically manipulating the document based on a predefined behavior
associated with the first electronic folder.

40. The method of claim 39, wherein the predefined behavior is a user-defined
behavior.

41. The method of claim 39, wherein the predefined behavior involves e-mailing the
document to a preprogrammed e-mail address.

42. The method of claim 39, wherein the predefined behavior involves providing
controlled access to the document.

43. The method of claim 1 further comprising the steps of:

linking the document to a folder, wherein the folder has associated with it a
predefined behavior; and

automatically manipulating the document in accordance with the predefined
behavior.

44. The method of claim 43, wherein the predefined behavior is a user-defined
behavior.

45. The method of claim 43, wherein the predefined behavior involves e-mailing the
document to a preprogrammed e-mail address.

46. The method of claim 43, wherein the predefined behavior involves providing
controlled access to the document.

47. The method of claim 1 further comprising the step of:

maintaining a second data structure that includes data defining a document
hierarchy for the document collection.

48. The method of claim 47 further comprising the step of:
updating the second data structure to include data that defines a link between the
data structure of the imported document and a document hierarchy folder or category.

49. The method of claim 47, wherein the second data structure includes data linking
all documents in the document collection to at least one folder or category.

50. The method of claim 47 further comprising the step of:

maintaining a third data structure that includes data defining a second document
hierarchy for the document collection, or a portion thereof, wherein the third data
structure is maintained at a local terminal connected to the computer system.

51. A computer-readable storage medium having stored therein a program which
executes the steps of:

importing a document into a computer-based system;
storing the document in memory;

automatically extracting attribute data from the document; and
generating a data structure corresponding to the document comprising the
extracted attribute data in a standardized format regardless of document type or
document format.

52. The computer-readable storage medium in accordance with claim 51, wherein
said program further comprises the executable steps of:

predefining category criteria for a first electronic folder; and
linking the document to the first electronic folder if the attribute data in the data
structure corresponding to the document matches the category criteria.

53. The computer-readable storage medium in accordance with claim 52, wherein
said program further comprises the executable steps of:

electronically analyzing the attribute data stored in the data structure
corresponding to the document;
comparing the attribute data to the predefined category criteria for the first
electronic folder;

determining whether the document is to be automatically linked to the first
electronic folder based on the comparison; and

identifying the document on an inclusion list if it is determined that the
document is not to be automatically linked to the first electronic folder.

54. The computer-readable storage medium in accordance with claim 52, wherein
said program further comprises the executable steps of:

electronically analyzing the attribute data stored in the data structure
corresponding to the document;

comparing the attribute data to the predefined category criteria for the first
electronic folder;

determining whether the document is to be automatically excluded from being
linked to the first electronic folder; and

identifying the document on an exclusion list if it is determined that the
document is not to be automatically excluded from being linked to the first electronic
folder.

55. The computer-readable storage medium in accordance with claim 52, wherein
said executable step of predefining category criteria for the first electronic folder
comprises the steps of:

storing a seed document in the first electronic folder;

analyzing the seed document; and

extracting the category criteria from the seed document.

56. The computer-readable storage medium in accordance with claim 58, wherein
the predefined category criteria is based on user-defined criteria.

57. The computer-readable storage medium in accordance with claim 58, wherein
said program further comprises the executable steps of:

monitoring document modifications; and

automatically linking the document to a second electronic folder if the attribute
data now matches predefined category criteria associated with the second electronic
folder.

58. The computer-readable storage medium in accordance with claim 52, wherein
said program further comprises the executable steps of:

monitoring document modification; and

automatically deleting the link between the document and the first electronic
folder if the attribute data no longer matches the predefined criteria associated with the
first electronic folder.

59. The computer-readable storage medium in accordance with claim 52, wherein
the attribute data is a document title.

60. The computer-readable storage medium in accordance with claim 52, wherein
the attribute data is a document author.

61. The computer-readable storage medium in accordance with claim 52, wherein
the attribute data is a phrase associated with the document.

62. The computer-readable storage medium in accordance with claim 52, wherein
the attribute data is a common concept.

63. The computer-readable storage medium in accordance with claim 52, wherein
the attribute data is a key word.

64. The computer-readable storage medium in accordance with claim 51, wherein
said program further comprises the executable steps of:

linking the document with an electronic folder; and
manipulating the document automatically based on a predefined behavior
associated with the electronic folder.

65. The computer-readable storage medium in accordance with claim 64, wherein
the predefined behavior is a user-defined behavior.

66. The computer-readable storage medium in accordance with claim 64, wherein
the predefined behavior involves e-mailing the document to a preprogrammed e-mail
address.

67. The computer-readable storage medium in accordance with claim 64, wherein
the predefined behavior involves providing controlled access to the document.

68. The computer-readable storage medium in accordance with claim 51, wherein
said step of importing a document into the computer-based system comprises the
executable steps of:
generating program instructions thus causing an optical scanner, connected to
the computer system, to optically scan the document, wherein the document is a
paper-based document; and
converting the optically scanned document into an electronic document.

69. The computer-readable storage medium in accordance with claim 68, wherein
the electronic document is an image file.

70. The computer-readable storage medium in accordance with claim 68, wherein
the electronic document is a text file.


71. The computer-readable storage medium in accordance with claim 51, wherein
said step of importing a document into the computer system comprises the executable
step of:

importing an electronic document.

72. The computer-readable storage medium in accordance with claim 71, wherein
the electronic document is a word processing document.

73. The computer-readable storage medium in accordance with claim 71,
wherein the electronic document is a document containing an image.

74. The computer-readable storage medium in accordance with claim 71,
wherein the electronic document is an e-mail message.

75. The computer-readable storage medium in accordance with claim [illegible],
wherein the electronic document is an HTML document.

76. The computer-readable storage medium in accordance with claim 51, wherein
said program further comprises the executable step of:
extracting indexing information from the attribute data in the data structure.

77. The computer-readable storage medium in accordance with claim [illegible], wherein
said program further comprises the executable steps of:
monitoring modifications to the document; and
extracting updated indexing information.

78. The computer-readable storage medium in accordance with claim 76, wherein
the attribute data is derived from a data field in the data structure comprising raw text
data.

79. The computer-readable storage medium in accordance with claim 76, wherein
said program further comprises the executable step of:

identifying the document from amongst other documents stored in the computer
system utilizing the indexing information.

80. The computer-readable storage medium in accordance with claim 51, wherein
said program further comprises the executable step of:

maintaining a second data structure that includes data defining a document
hierarchy for the document collection.

81. The computer-readable storage medium in accordance with claim 80, wherein
said program further comprises the executable step of:

updating the second data structure to include data that defines a link between the
data structure of the imported document and a document hierarchy folder or category.

82. The computer-readable storage medium in accordance with claim 80, wherein
the second data structure includes data linking all documents in the document collection
to at least one folder or category.

83. The computer-readable storage medium in accordance with claim 80, wherein
said program further comprises the executable step of:

maintaining a third data structure that includes data defining a second document
hierarchy for the document collection, or a portion thereof, wherein the third data
structure is maintained at a local terminal connected to the computer system.
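
Purely as an informal illustration, and not as part of or a limitation on the claims, the steps of extracting attribute data into a format-independent data structure and of linking or unlinking a document according to folder criteria might be sketched as follows. Every name in the sketch (DocRecord, Folder, extract_attributes, relink) is hypothetical, the extractor is a toy that reads "Key: value" header lines from raw text, and folder criteria are modeled as simple predicate functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set


@dataclass
class DocRecord:
    """Format-independent record: the same fields whatever the source format."""
    file_name: str
    location: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Folder:
    """Electronic folder with predefined criteria, modeled as a predicate."""
    name: str
    criteria: Callable[[Dict[str, str]], bool]
    members: Set[str] = field(default_factory=set)


def extract_attributes(raw_text: str) -> Dict[str, str]:
    """Toy extractor: read 'Key: value' header lines, then count words."""
    attrs: Dict[str, str] = {}
    for line in raw_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("author", "title"):
            attrs[key.strip().lower()] = value.strip()
    attrs["word count"] = str(len(raw_text.split()))
    return attrs


def relink(doc: DocRecord, folders: Iterable[Folder]) -> None:
    """Link the document where criteria match; drop links that no longer do."""
    for folder in folders:
        if folder.criteria(doc.attributes):
            folder.members.add(doc.file_name)
        else:
            folder.members.discard(doc.file_name)
```

Calling relink again after a document has been modified and its attributes re-extracted both adds any newly matching folder links and removes links whose criteria are no longer satisfied.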